Search | VHL Regional Portal

1.

Identification of bacterial determinants of tuberculosis infection and treatment outcomes: a phenogenomic analysis of clinical strains.

Stanley, Sydney; Spaulding, Caitlin N; Liu, Qingyun; Chase, Michael R; Ha, Dang Thi Minh; Thai, Phan Vuong Khac; Lan, Nguyen Huu; Thu, Do Dang Anh; Quang, Nguyen Le; Brown, Jessica; Hicks, Nathan D; Wang, Xin; Marin, Maximillian; Howard, Nicole C; Vickers, Andrew J; Karpinski, Wiktor M; Chao, Michael C; Farhat, Maha R; Caws, Maxine; Dunstan, Sarah J; Thuong, Nguyen Thuy Thuong; Fortune, Sarah M.

Lancet Microbe ; 2024 May 06.

Article in English | MEDLINE | ID: mdl-38734030

ABSTRACT

BACKGROUND: Bacterial diversity could contribute to the diversity of tuberculosis infection and treatment outcomes observed clinically, but the biological basis of this association is poorly understood. The aim of this study was to identify associations between phenogenomic variation in Mycobacterium tuberculosis and tuberculosis clinical features. METHODS: We developed a high-throughput platform to define phenotype-genotype relationships in M tuberculosis clinical isolates, which we tested on a set of 158 drug-sensitive M tuberculosis strains sampled from a large tuberculosis clinical study in Ho Chi Minh City, Viet Nam. We tagged the strains with unique genetic barcodes in multiplicate, allowing us to pool the strains for in-vitro competitive fitness assays across 16 host-relevant antibiotic and metabolic conditions. Relative fitness was quantified by deep sequencing, enumerating output barcode read counts relative to input normalised values. We performed a genome-wide association study to identify phylogenetically linked and monogenic mutations associated with the in-vitro fitness phenotypes. These genetic determinants were further associated with relevant clinical outcomes (cavitary disease and treatment failure) by calculating odds ratios (ORs) with binomial logistic regressions. We also assessed the population-level transmission of strains associated with cavitary disease and treatment failure using terminal branch length analysis of the phylogenetic data. FINDINGS: M tuberculosis clinical strains had diverse growth characteristics in host-like metabolic and drug conditions. These fitness phenotypes were highly heritable, and we identified monogenic and phylogenetically linked variants associated with the fitness phenotypes. These data enabled us to define two genetic features that were associated with clinical outcomes. 
First, mutations in Rv1339, a phosphodiesterase, which were associated with slow growth in glycerol, were further associated with treatment failure (OR 5·34, 95% CI 1·21-23·58, p=0·027). Second, we identified a phenotypically distinct slow-growing subclade of lineage 1 strains (L1.1.1.1) that was associated with cavitary disease (OR 2·49, 1·11-5·59, p=0·027) and treatment failure (OR 4·76, 1·53-14·78, p=0·0069), and which had shorter terminal branch lengths on the phylogenetic tree, suggesting increased transmission. INTERPRETATION: Slow growth under various antibiotic and metabolic conditions served as in-vitro intermediate phenotypes underlying the association between M tuberculosis monogenic and phylogenetically linked mutations and outcomes such as cavitary disease, treatment failure, and transmission potential. These data suggest that M tuberculosis growth regulation is an adaptive advantage for bacterial success in human populations, at least in some circumstances. These data further suggest markers for the underlying bacterial processes that contribute to these clinical outcomes. FUNDING: National Health and Medical Research Council/A*STAR, National Institute of Allergy and Infectious Diseases, National Institute of Child Health and Human Development, and the Wellcome Trust Fellowship in Public Health and Tropical Medicine.
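As a quick arithmetic check on the odds ratios reported in this abstract: a Wald-type 95% CI from a logistic regression is symmetric on the log-odds scale, so the point estimate should sit close to the geometric mean of its bounds. The sketch below assumes that CI construction (the abstract does not state which method was used), and the function name `geometric_midpoint` is hypothetical.

```python
import math

def geometric_midpoint(ci_low, ci_high):
    """Geometric mean of the CI bounds.

    If the interval is symmetric on the log-odds scale (assumed here),
    this value should match the reported odds ratio up to rounding.
    """
    return math.sqrt(ci_low * ci_high)

# (label, OR, CI lower, CI upper) exactly as reported in the abstract
reported = [
    ("Rv1339 / treatment failure", 5.34, 1.21, 23.58),
    ("L1.1.1.1 / cavitary disease", 2.49, 1.11, 5.59),
    ("L1.1.1.1 / treatment failure", 4.76, 1.53, 14.78),
]

midpoints = {}
for label, odds_ratio, low, high in reported:
    midpoints[label] = geometric_midpoint(low, high)
    print(f"{label}: reported OR {odds_ratio:.2f}, "
          f"geometric midpoint of CI {midpoints[label]:.2f}")
```

All three reported estimates agree with the geometric midpoint of their intervals to within rounding, which is consistent with log-scale Wald intervals.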

2.

Diverse and abundant phages exploit conjugative plasmids.

Quinones-Olvera, Natalia; Owen, Siân V; McCully, Lucy M; Marin, Maximillian G; Rand, Eleanor A; Fan, Alice C; Martins Dosumu, Oluremi J; Paul, Kay; Sanchez Castaño, Cleotilde E; Petherbridge, Rachel; Paull, Jillian S; Baym, Michael.

Nat Commun ; 15(1): 3197, 2024 Apr 12.

Article in English | MEDLINE | ID: mdl-38609370

ABSTRACT

Phages exert profound evolutionary pressure on bacteria by interacting with receptors on the cell surface to initiate infection. While the majority of phages use chromosomally encoded cell surface structures as receptors, plasmid-dependent phages exploit plasmid-encoded conjugation proteins, making their host range dependent on horizontal transfer of the plasmid. Despite their unique biology and biotechnological significance, only a small number of plasmid-dependent phages have been characterized. Here we systematically search for new plasmid-dependent phages targeting IncP and IncF plasmids using a targeted discovery platform, and find that they are common and abundant in wastewater, and largely unexplored in terms of their genetic diversity. Plasmid-dependent phages are enriched in non-canonical types of phages, and all but one of the 65 phages we isolated were non-tailed, and members of the lipid-containing tectiviruses, ssDNA filamentous phages or ssRNA phages. We show that plasmid-dependent tectiviruses exhibit profound differences in their host range which is associated with variation in the phage holin protein. Despite their relatively high abundance in wastewater, plasmid-dependent tectiviruses are missed by metaviromic analyses, underscoring the continued importance of culture-based phage discovery. Finally, we identify a tailed phage dependent on the IncF plasmid, and find related structural genes in phages that use the orthogonal type 4 pilus as a receptor, highlighting the evolutionarily promiscuous use of these distinct contractile structures by multiple groups of phages. Taken together, these results indicate plasmid-dependent phages play an under-appreciated evolutionary role in constraining horizontal gene transfer via conjugative plasmids.

Subject(s)

Bacteriophages , Bacteriophages/genetics , Wastewater , Biological Evolution , Biotechnology , Cell Membrane

3.

Analysis of the limited M. tuberculosis accessory genome reveals potential pitfalls of pan-genome analysis approaches.

Marin, Maximillian G; Wippel, Christoph; Quinones-Olvera, Natalia; Behruznia, Mahboobeh; Jeffrey, Brendan M; Harris, Michael; Mann, Brendon C; Rosenthal, Alex; Jacobson, Karen R; Warren, Robin M; Li, Heng; Meehan, Conor J; Farhat, Maha R.

bioRxiv ; 2024 May 04.

Article in English | MEDLINE | ID: mdl-38585972

ABSTRACT

Pan-genome analysis is a fundamental tool for studying bacterial genome evolution; however, the variety of methods used to define and measure the pan-genome poses challenges to the interpretation and reliability of results. To quantify sources of bias and error related to common pan-genome analysis approaches, we evaluated different approaches applied to a curated collection of 151 Mycobacterium tuberculosis (Mtb) isolates. Mtb is characterized by its clonal evolution, absence of horizontal gene transfer, and limited accessory genome, making it an ideal test case for this study. Using a state-of-the-art graph-genome approach, we found that a majority of the structural variation observed in Mtb originates from rearrangement, deletion, and duplication of redundant nucleotide sequences. In contrast, we found that pan-genome analyses that focus on comparison of coding sequences (at the amino acid level) can yield surprisingly variable results, driven by differences in assembly quality and the software used. Upon closer inspection, we found that coding sequence annotation discrepancies were a major contributor to inflated Mtb accessory genome estimates. To address this, we developed panqc, a software tool that detects annotation discrepancies and collapses nucleotide redundancy in pan-genome estimates. When applied to Mtb and E. coli pan-genomes, panqc exposed distinct biases influenced by the genomic diversity of the population studied. Our findings underscore the need for careful methodological selection and quality control to accurately map the evolutionary dynamics of a bacterial species.

4.

Exploring gene content with pangenome gene graphs.

Li, Heng; Marin, Maximillian; Farhat, Maha Reda.

ArXiv ; 2024 Feb 27.

Article in English | MEDLINE | ID: mdl-38463499

ABSTRACT

Motivation: The gene content regulates the biology of an organism. It varies between species and between individuals of the same species. Although tools have been developed to identify gene content changes in bacterial genomes, none is applicable to collections of large eukaryotic genomes such as the human pangenome. Results: We developed pangene, a computational tool to identify gene orientation, gene order and gene copy-number changes in a collection of genomes. Pangene aligns a set of input protein sequences to the genomes, resolves redundancies between protein sequences and constructs a gene graph with each genome represented as a walk in the graph. It additionally finds subgraphs that encode gene content changes. Applied to the human pangenome, pangene identifies known gene-level variations and reveals complex haplotypes that have not been well studied. Pangene also works with high-quality bacterial pangenomes and reports similar numbers of core and accessory genes in comparison to existing tools. Availability and implementation: Source code at https://github.com/lh3/pangene; pre-built pangene graphs can be downloaded from https://zenodo.org/records/8118576 and visualized at https://pangene.bioinweb.org.
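The idea of a gene graph in which each genome is a walk can be illustrated with a toy sketch. This is not pangene's implementation or data model: the input format (ordered lists of `(gene, strand)` pairs), the function names, and the three example genomes are all hypothetical, and orientation is handled naively rather than with a bidirected graph.

```python
from collections import defaultdict

def build_gene_graph(genomes):
    """Map each edge (a pair of consecutive oriented genes) to the genomes that traverse it.

    genomes: dict of genome name -> list of (gene, strand) tuples in genomic order.
    """
    edges = defaultdict(set)
    for name, walk in genomes.items():
        # Each genome contributes one edge per adjacent gene pair along its walk.
        for left, right in zip(walk, walk[1:]):
            edges[(left, right)].add(name)
    return dict(edges)

def core_and_accessory(genomes):
    """Split genes into those present in every genome (core) and the rest (accessory)."""
    gene_sets = [{gene for gene, _ in walk} for walk in genomes.values()]
    all_genes = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return core, all_genes - core

# Hypothetical toy input: g2 carries gene B inverted, g3 lacks gene B.
genomes = {
    "g1": [("A", "+"), ("B", "+"), ("C", "+")],
    "g2": [("A", "+"), ("B", "-"), ("C", "+")],
    "g3": [("A", "+"), ("C", "+")],
}

graph = build_gene_graph(genomes)
core, accessory = core_and_accessory(genomes)
print(len(graph), "edges; core:", sorted(core), "accessory:", sorted(accessory))
```

In this toy, the inversion in `g2` and the deletion in `g3` each show up as alternative paths between genes A and C, which is the kind of local subgraph that signals a gene content or orientation change.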

5.

Evaluating generalizability of artificial intelligence models for molecular datasets.

Ektefaie, Yasha; Shen, Andrew; Bykova, Daria; Marin, Maximillian; Zitnik, Marinka; Farhat, Maha.

bioRxiv ; 2024 Feb 28.

Article in English | MEDLINE | ID: mdl-38464295

ABSTRACT

Deep learning has made rapid advances in modeling molecular sequencing data. Despite achieving high performance on benchmarks, it remains unclear to what extent deep learning models learn general principles and generalize to previously unseen sequences. Benchmarks traditionally interrogate model generalizability by generating metadata-based (MB) or sequence-similarity-based (SB) train and test splits of input data before assessing model performance. Here, we show that this approach mischaracterizes model generalizability by failing to consider the full spectrum of cross-split overlap, i.e., similarity between train and test splits. We introduce Spectra, a spectral framework for comprehensive model evaluation. For a given model and input data, Spectra plots model performance as a function of decreasing cross-split overlap and reports the area under this curve as a measure of generalizability. We apply Spectra to 18 sequencing datasets with associated phenotypes ranging from antibiotic resistance in tuberculosis to protein-ligand binding to evaluate the generalizability of 19 state-of-the-art deep learning models, including large language models, graph neural networks, diffusion models, and convolutional neural networks. We show that SB and MB splits provide an incomplete assessment of model generalizability. With Spectra, we find that as cross-split overlap decreases, deep learning models consistently exhibit a reduction in performance in a task- and model-dependent manner. Although no model consistently achieved the highest performance across all tasks, we show that deep learning models can generalize to previously unseen sequences on specific tasks. Spectra paves the way toward a better understanding of how foundation models generalize in biology.
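The summary statistic described here, the area under a curve of performance against decreasing cross-split overlap, can be sketched with a trapezoidal rule. This is an illustration only, not the Spectra implementation: the x-axis parameterization (a 0 to 1 "strictness" value, higher meaning less overlap), the function name, and the performance numbers are all hypothetical.

```python
def area_under_curve(xs, ys):
    """Trapezoidal area under the points (xs[i], ys[i]); xs must be ascending."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) points of equal length")
    area = 0.0
    for i in range(1, len(xs)):
        width = xs[i] - xs[i - 1]
        if width < 0:
            raise ValueError("xs must be sorted in ascending order")
        # Area of the trapezoid between consecutive evaluation points.
        area += width * (ys[i] + ys[i - 1]) / 2.0
    return area

# Hypothetical evaluation: performance measured on splits with less and less
# train/test overlap (strictness 0.0 = most overlap, 1.0 = least overlap).
strictness = [0.0, 0.25, 0.5, 0.75, 1.0]
performance = [0.90, 0.85, 0.75, 0.60, 0.50]

score = area_under_curve(strictness, performance)
print(f"area under the performance curve: {score:.3f}")
```

A model whose performance holds steady as overlap shrinks keeps a high area, while one that only does well on high-overlap splits loses area toward the strict end of the curve.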

6.

Phase variation as a major mechanism of adaptation in Mycobacterium tuberculosis complex.

Vargas, Roger; Luna, Michael J; Freschi, Luca; Marin, Maximillian; Froom, Ruby; Murphy, Kenan C; Campbell, Elizabeth A; Ioerger, Thomas R; Sassetti, Christopher M; Farhat, Maha Reda.

Proc Natl Acad Sci U S A ; 120(28): e2301394120, 2023 07 11.

Article in English | MEDLINE | ID: mdl-37399390

ABSTRACT

Phase variation induced by insertions and deletions (INDELs) in genomic homopolymeric tracts (HT) can silence and regulate genes in pathogenic bacteria, but this process is not characterized in MTBC (Mycobacterium tuberculosis complex) adaptation. We leverage 31,428 diverse clinical isolates to identify genomic regions including phase-variants under positive selection. Of 87,651 INDEL events that emerge repeatedly across the phylogeny, 12.4% are phase-variants within HTs (0.02% of the genome by length). We estimated the in-vitro frameshift rate in a neutral HT at 100× the neutral substitution rate at [Formula: see text] frameshifts/HT/year. Using neutral evolution simulations, we identified 4,098 substitutions and 45 phase-variants to be putatively adaptive to MTBC (P < 0.002). We experimentally confirm that a putatively adaptive phase-variant alters the expression of espA, a critical mediator of ESX-1-dependent virulence. Our evidence supports the hypothesis that phase variation in the ESX-1 system of MTBC can act as a toggle between antigenicity and survival in the host.

Subject(s)

Mycobacterium tuberculosis , Mycobacterium tuberculosis/genetics , Phase Variation , Genomics , Adaptation, Physiological/genetics , Virulence/genetics , Phylogeny , Genome, Bacterial

7.

Full-length transcript alterations in human bronchial epithelial cells with U2AF1 S34F mutations.

Soulette, Cameron M; Hrabeta-Robinson, Eva; Arevalo, Carlos; Felton, Colette; Tang, Alison D; Marin, Maximillian G; Brooks, Angela N.

Life Sci Alliance ; 6(10), 2023 10.

Article in English | MEDLINE | ID: mdl-37487637

ABSTRACT

U2AF1 is one of the most recurrently mutated splicing factors in lung adenocarcinoma and has been shown to cause transcriptome-wide pre-mRNA splicing alterations; however, the full-length altered mRNA isoforms associated with the mutation are largely unknown. To better understand the impact U2AF1 has on full-length isoform fate and function, we conducted high-throughput long-read cDNA sequencing from isogenic human bronchial epithelial cells with and without a U2AF1 S34F mutation. We identified 49,366 multi-exon transcript isoforms, more than half of which did not match GENCODE or short-read-assembled isoforms. We found 198 transcript isoforms with significant expression and usage changes relative to WT, only 68% of which were assembled by short reads. Expression of isoforms from immune-related genes is largely down-regulated in mutant cells and without observed splicing changes. Finally, we reveal that isoforms likely targeted by nonsense-mediated decay are down-regulated in U2AF1 S34F cells, suggesting that isoform changes may alter the translational output of those affected genes. Altogether, our work provides a resource of full-length isoforms associated with U2AF1 S34F in lung cells.

Subject(s)

Epithelial Cells , RNA Splicing , Humans , Splicing Factor U2AF/genetics , Splicing Factor U2AF/metabolism , RNA Splicing/genetics , Protein Isoforms/genetics , Protein Isoforms/metabolism , Epithelial Cells/metabolism , Mutation/genetics

8.

Analysis of Genome-Wide Mutational Dependence in Naturally Evolving Mycobacterium tuberculosis Populations.

Green, Anna G; Vargas, Roger; Marin, Maximillian G; Freschi, Luca; Xie, Jiaqi; Farhat, Maha R.

Mol Biol Evol ; 40(6), 2023 06 01.

Article in English | MEDLINE | ID: mdl-37352142

ABSTRACT

Pathogenic microorganisms are in a perpetual struggle for survival in changing host environments, where host pressures necessitate changes in pathogen virulence, antibiotic resistance, or transmissibility. The genetic basis of phenotypic adaptation by pathogens is difficult to study in vivo. In this work, we develop a phylogenetic method to detect genetic dependencies that promote pathogen adaptation using 31,428 in vivo sampled genomes of Mycobacterium tuberculosis, a globally prevalent bacterial pathogen with increasing levels of antibiotic resistance. We find that dependencies between mutations are enriched in antigenic and antibiotic resistance functions and discover 23 mutations that potentiate the development of antibiotic resistance. Between 11% and 92% of resistant strains harbor a dependent mutation acquired after a resistance-conferring variant. We demonstrate the pervasiveness of genetic dependency in adaptation of naturally evolving populations and the utility of the proposed computational approach.

Subject(s)

Mycobacterium tuberculosis , Mycobacterium tuberculosis/genetics , Antitubercular Agents/therapeutic use , Phylogeny , Mutation , Virulence , Microbial Sensitivity Tests

9.

High-throughput phenogenotyping of Mycobacteria tuberculosis clinical strains reveals bacterial determinants of treatment outcomes.

Stanley, Sydney; Spaulding, Caitlin N; Liu, Qingyun; Chase, Michael R; Ha, Dang Thi Minh; Thai, Phan Vuong Khac; Lan, Nguyen Huu; Thu, Do Dang Anh; Quang, Nguyen Le; Brown, Jessica; Hicks, Nathan D; Wang, Xin; Marin, Maximillian; Howard, Nicole C; Vickers, Andrew J; Karpinski, Wiktor M; Chao, Michael C; Farhat, Maha R; Caws, Maxine; Dunstan, Sarah J; Thuong, Nguyen Thuy Thuong; Fortune, Sarah M.

bioRxiv ; 2023 Apr 10.

Article in English | MEDLINE | ID: mdl-37090677

ABSTRACT

Background: Combatting the tuberculosis (TB) epidemic caused by Mycobacterium tuberculosis (Mtb) necessitates a better understanding of the factors contributing to patient clinical outcomes and transmission. While host and environmental factors have been evaluated, the impact of Mtb genetic background and phenotypic diversity is underexplored. Previous work has made associations between Mtb genetic lineages and some clinical and epidemiological features, but the bacterial traits underlying these connections are largely unknown. Methods: We developed a high-throughput functional genomics platform for defining genotype-phenotype relationships across a panel of Mtb clinical isolates. These phenotypic fitness profiles function as intermediate traits which can be linked to Mtb genetic variants and associated with clinical and epidemiological outcomes. We applied this approach to a collection of 158 Mtb strains from a study of Mtb transmission in Ho Chi Minh City, Vietnam. Mtb strains were genetically tagged in multiplicate, which allowed us to pool the strains and assess in vitro competitive fitness using deep sequencing across a set of 14 host-relevant antibiotic and metabolic conditions. Phylogenetic and monogenic associations with these intermediate traits were identified and then associated with clinical outcomes. Findings: Mtb clinical strains have a broad range of growth and drug response dynamics that can be clustered by their phylogenetic relationships. We identified novel monogenic associations with Mtb fitness in various metabolic and antibiotic conditions. Among these, we find that mutations in Rv1339, a phosphodiesterase, which were identified through their association with slow growth in glycerol, are further associated with treatment failure. We also identify a previously uncharacterized subclade of Lineage 1 strains (L1.1.1.1) that is phenotypically distinguished by slow growth under most antibiotic and metabolic stress conditions in vitro. 
This clade is associated with cavitary disease and treatment failure, and demonstrates increased transmission potential. Interpretation: High-throughput phenogenotyping of Mtb clinical strains enabled bacterial intermediate trait identification that can provide a mechanistic link between Mtb genetic variation and patient clinical outcomes. Mtb strains associated with cavitary disease, treatment failure, and transmission potential display intermediate phenotypes distinguished by slow growth under various antibiotic and metabolic conditions. These data suggest that Mtb growth regulation is an adaptive advantage for host bacterial success in human populations, in at least some circumstances. These data further suggest markers for the underlying bacterial processes that govern these clinical outcomes. Funding: National Institute of Allergy and Infectious Diseases: P01 AI132130 (SS, SMF); P01 AI143575 (XW, SMF); U19 AI142793 (QL, SMF); 5T32AI132120-03 (SS); 5T32AI132120-04 (SS); 5T32AI049928-17 (SS). Wellcome Trust Fellowship in Public Health and Tropical Medicine: 097124/Z/11/Z (NTTT). National Health and Medical Research Council (NHMRC)/A*STAR joint call: APP1056689 (SJD). The funding sources had no involvement in study methodology, data collection, analysis, and interpretation nor in the writing or submission of the manuscript. Research in context: Evidence before this study: We used different combinations of the words mycobacterium tuberculosis, tuberculosis, clinical strains, intermediate phenotypes, genetic barcoding, phenogenomics, cavitary disease, treatment failure, and transmission to search the PubMed database for all studies published up until January 20th, 2022. We only considered English language publications, which biases our search. 
Previous work linking Mtb determinants to clinical or epidemiological data has made associations between bacterial lineage, or less frequently, genetic polymorphisms to in vitro or in vivo models of pathogenesis, transmission, and clinical outcomes such as cavitary disease, treatment failure, delayed culture conversion, and severity. Many of these studies focus on the global pandemic Lineage 2 and Lineage 4 Mtb strains due in part to a deletion in a polyketide synthase implicated in host-pathogen interactions. There are a number of Mtb GWAS studies that have led to novel genetic determinants of in vitro drug resistance and tolerance. Previous Mtb GWAS analyses with clinical outcomes did not experimentally test any predicted phenotypes of the clinical strains. Published laboratory-based studies of Mtb clinical strains involve relatively small numbers of strains, do not identify the genetic basis of relevant phenotypes, or link findings to the corresponding clinical outcomes. There are two recent studies of other pathogens that describe phenogenomic analyses. One study of 331 M. abscessus clinical strains performed one-by-one phenotyping to identify bacterial features associated with clearance of infection and another details a competition experiment utilizing three barcoded Plasmodium falciparum clinical isolates to assay antimalarial fitness and resistance. Added value of this study: We developed a functional genomics platform to perform high-throughput phenotyping of Mtb clinical strains. We then used these phenotypes as intermediate traits to identify novel bacterial genetic features associated with clinical outcomes. We leveraged this platform with a sample of 158 Mtb clinical strains from a cross sectional study of Mtb transmission in Ho Chi Minh City, Vietnam. 
To enable high-throughput phenotyping of large numbers of Mtb clinical isolates, we applied a DNA barcoding approach that has not been previously utilized for the high-throughput analysis of Mtb clinical strains. This approach allowed us to perform pooled competitive fitness assays, tracking strain fitness using deep sequencing. We measured the replicative fitness of the clinical strains in multiplicate under 14 metabolic and antibiotic stress conditions. To our knowledge, this is the largest phenotypic screen of Mtb clinical isolates to date. We performed bacterial GWAS to delineate the Mtb genetic variants associated with each fitness phenotype, identifying monogenic associations with several conditions. We then defined Mtb phenotypic and genetic features associated with clinical outcomes. We find that a subclade of Mtb strains, defined by variants largely involved in fatty acid metabolic pathways, share a universal slow growth phenotype that is associated with cavitary disease, treatment failure and increased transmission potential in Vietnam. We also find that mutations in Rv1339, a poorly characterized phosphodiesterase, also associate with slow growth in vitro and with treatment failure in patients. Implications of all the available evidence: Phenogenomic profiling demonstrates that Mtb strains exhibit distinct growth characteristics under metabolic and antibiotic stress conditions. These fitness profiles can serve as intermediate traits for GWAS and association with clinical outcomes. Intermediate phenotyping allows us to examine potential processes by which bacterial strain differences contribute to clinical outcomes. Our study identifies clinical strains with slow growth phenotypes under in vitro models of antibiotic and host-like metabolic conditions that are associated with adverse clinical outcomes. 
It is possible that the bacterial intermediate phenotypes we identified are directly related to the mechanisms of these outcomes, or they may serve as markers for the causal yet unidentified bacterial determinants. Via the intermediate phenotyping, we also discovered a surprising diversity in Mtb responses to the new anti-mycobacterial drugs that target central metabolic processes, which will be important in considering roll-out of these new agents. Our study and others that have identified Mtb determinants of TB clinical and epidemiological phenotypes should inform efforts to improve diagnostics and drug regimen design.

10.

Diverse and abundant phages exploit conjugative plasmids.

Quinones-Olvera, Natalia; Owen, Siân V; McCully, Lucy M; Marin, Maximillian G; Rand, Eleanor A; Fan, Alice C; Martins Dosumu, Oluremi J; Paul, Kay; Sanchez Castaño, Cleotilde E; Petherbridge, Rachel; Paull, Jillian S; Baym, Michael.

bioRxiv ; 2023 Dec 21.

Article in English | MEDLINE | ID: mdl-36993299

ABSTRACT

Phages exert profound evolutionary pressure on bacteria by interacting with receptors on the cell surface to initiate infection. While the majority of phages use chromosomally-encoded cell surface structures as receptors, plasmid-dependent phages exploit plasmid-encoded conjugation proteins, making their host range dependent on horizontal transfer of the plasmid. Despite their unique biology and biotechnological significance, only a small number of plasmid-dependent phages have been characterized. Here we systematically search for new plasmid-dependent phages targeting IncP and IncF plasmids using a targeted discovery platform, and find that they are common and abundant in wastewater, and largely unexplored in terms of their genetic diversity. Plasmid-dependent phages are enriched in non-canonical types of phages, and all but one of the 64 phages we isolated were non-tailed, and members of the lipid-containing tectiviruses, ssDNA filamentous phages or ssRNA phages. We show that plasmid-dependent tectiviruses exhibit profound differences in their host range which is associated with variation in the phage holin protein. Despite their relatively high abundance in wastewater, plasmid-dependent tectiviruses are missed by metaviromic analyses, underscoring the continued importance of culture-based phage discovery. Finally, we identify a tailed phage dependent on the IncF plasmid, and find related structural genes in phages that use the orthogonal type 4 pilus as a receptor, highlighting the promiscuous use of these distinct contractile structures by multiple groups of phages. Taken together, these results indicate plasmid-dependent phages play an under-appreciated evolutionary role in constraining horizontal gene transfer via conjugative plasmids.

11.

Author Correction: Genomic basis for RNA alterations in cancer.

Calabrese, Claudia; Davidson, Natalie R; Demircioglu, Deniz; Fonseca, Nuno A; He, Yao; Kahles, André; Lehmann, Kjong-Van; Liu, Fenglin; Shiraishi, Yuichi; Soulette, Cameron M; Urban, Lara; Greger, Liliana; Li, Siliang; Liu, Dongbing; Perry, Marc D; Xiang, Qian; Zhang, Fan; Zhang, Junjun; Bailey, Peter; Erkek, Serap; Hoadley, Katherine A; Hou, Yong; Huska, Matthew R; Kilpinen, Helena; Korbel, Jan O; Marin, Maximillian G; Markowski, Julia; Nandi, Tannistha; Pan-Hammarström, Qiang; Pedamallu, Chandra Sekhar; Siebert, Reiner; Stark, Stefan G; Su, Hong; Tan, Patrick; Waszak, Sebastian M; Yung, Christina; Zhu, Shida; Awadalla, Philip; Creighton, Chad J; Meyerson, Matthew; Ouellette, B F Francis; Wu, Kui; Yang, Huanming; Brazma, Alvis; Brooks, Angela N; Göke, Jonathan; Rätsch, Gunnar; Schwarz, Roland F; Stegle, Oliver; Zhang, Zemin.

Nature ; 614(7948): E37, 2023 Feb.

Article in English | MEDLINE | ID: mdl-36697831

12.

Benchmarking the empirical accuracy of short-read sequencing across the M. tuberculosis genome.

Marin, Maximillian; Vargas, Roger; Harris, Michael; Jeffrey, Brendan; Epperson, L Elaine; Durbin, David; Strong, Michael; Salfinger, Max; Iqbal, Zamin; Akhundova, Irada; Vashakidze, Sergo; Crudu, Valeriu; Rosenthal, Alex; Farhat, Maha Reda.

Bioinformatics ; 38(7): 1781-1787, 2022 03 28.

Article in English | MEDLINE | ID: mdl-35020793

ABSTRACT

MOTIVATION: Short-read whole-genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences and sequencing bias reduce the performance of variant calling using short-read alignment, but the loss in recall and specificity has not been adequately characterized. To benchmark short-read variant calling, we used 36 diverse clinical Mycobacterium tuberculosis (Mtb) isolates dually sequenced with Illumina short-reads and PacBio long-reads. We systematically studied the short-read variant calling accuracy and the influence of sequence uniqueness, reference bias and GC content. RESULTS: Reference-based Illumina variant calling demonstrated a maximum recall of 89.0% and minimum precision of 98.5% across parameters evaluated. The approach that maximized variant recall while still maintaining high precision (>99%) was tuning the mapping quality filtering threshold, i.e. confidence of the read mapping (recall = 85.8%, precision = 99.1%, MQ ≥ 40). Additional masking of repetitive sequence content is an alternative conservative approach to variant calling that increases precision at a cost to recall (recall = 70.2%, precision = 99.6%, MQ ≥ 40). Of the genomic positions typically excluded for Mtb, 68% are accurately called using Illumina WGS including 52/168 PE/PPE genes (34.5%). From these results, we present a refined list of low confidence regions across the Mtb genome, which we found to frequently overlap with regions with structural variation, low sequence uniqueness and low sequencing coverage. Our benchmarking results have broad implications for the use of WGS in the study of Mtb biology, inference of transmission in public health surveillance systems and more generally for WGS applications in other organisms. AVAILABILITY AND IMPLEMENTATION: All relevant code is available at https://github.com/farhat-lab/mtb-illumina-wgs-evaluation. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
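For readers less familiar with the metrics quoted in this abstract, precision and recall for a variant call set can be computed from counts of true positives, false positives and false negatives against a truth set (here, presumably the long-read-derived variants). The sketch below is illustrative only: the function name is hypothetical, and the counts are invented so that the result lands near the reported MQ ≥ 40 figures; they are not taken from the study.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for a call set compared with a truth set.

    precision = TP / (TP + FP): fraction of called variants that are real.
    recall    = TP / (TP + FN): fraction of real variants that were called.
    """
    if true_pos + false_pos == 0 or true_pos + false_neg == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Hypothetical counts per 1,000 true variants (not data from the paper).
precision, recall = precision_recall(true_pos=858, false_pos=8, false_neg=142)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
```

Stricter filtering, such as masking repetitive regions, typically removes false positives and true positives together, which is why the abstract reports precision rising while recall falls.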

Subject(s)

Mycobacterium tuberculosis , Tuberculosis , Humans , Benchmarking , Mycobacterium tuberculosis/genetics , Software , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods

13.

An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates.

Mc Cartney, Ann M; Mahmoud, Medhat; Jochum, Michael; Agustinho, Daniel Paiva; Zorman, Barry; Al Khleifat, Ahmad; Dabbaghie, Fawaz; K Kesharwani, Rupesh; Smolka, Moritz; Dawood, Moez; Albin, Dreycey; Aliyev, Elbay; Almabrazi, Hakeem; Arslan, Ahmed; Balaji, Advait; Behera, Sairam; Billingsley, Kimberley; L Cameron, Daniel; Daw, Joyjit; T Dawson, Eric; De Coster, Wouter; Du, Haowei; Dunn, Christopher; Esteban, Rocio; Jolly, Angad; Kalra, Divya; Liao, Chunxiao; Liu, Yunxi; Lu, Tsung-Yu; M Havrilla, James; M Khayat, Michael; Marin, Maximillian; Monlong, Jean; Price, Stephen; Rafael Gener, Alejandro; Ren, Jingwen; Sagayaradj, Sagayamary; Sapoval, Nicolae; Sinner, Claude; C Soto, Daniela; Soylev, Arda; Subramaniyan, Arun; Syed, Najeeb; Tadimeti, Neha; Tater, Pamella; Vats, Pankaj; Vaughn, Justin; Walker, Kimberly; Wang, Gaojianyong; Zeng, Qiandong.

F1000Res ; 10: 246, 2021.

Article in English | MEDLINE | ID: mdl-34621504

ABSTRACT

In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on related topics in Structural Variation, Pan-genomes, and SARS-CoV-2 research. The overarching focus was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation and filtering, together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.

Subject(s)

COVID-19 , SARS-CoV-2 , Animals , Genome, Viral , Humans , Vertebrates

14.

In-host population dynamics of Mycobacterium tuberculosis complex during active disease.

Vargas, Roger; Freschi, Luca; Marin, Maximillian; Epperson, L Elaine; Smith, Melissa; Oussenko, Irina; Durbin, David; Strong, Michael; Salfinger, Max; Farhat, Maha Reda.

Elife ; 10, 2021 02 01.

Article in English | MEDLINE | ID: mdl-33522489

ABSTRACT

Tuberculosis (TB) is a leading cause of death globally. Understanding the population dynamics of TB's causative agent Mycobacterium tuberculosis complex (Mtbc) in-host is vital for understanding the efficacy of antibiotic treatment. We use longitudinally collected clinical Mtbc isolates that underwent Whole-Genome Sequencing from the sputa of 200 patients to investigate Mtbc diversity during the course of active TB disease after excluding 107 cases suspected of reinfection, mixed infection or contamination. Of the 178/200 patients with persistent clonal infection >2 months, 27 developed new resistance mutations between sampling with 20/27 occurring in patients with pre-existing resistance. Low abundance resistance variants at a purity of ≥19% in the first isolate predict fixation in the subsequent sample. We identify significant in-host variation in 27 genes, including antibiotic resistance genes, metabolic genes and genes known to modulate host innate immunity and confirm several to be under positive selection by assessing phylogenetic convergence across a genetically diverse sample of 20,352 isolates.

Subject(s)

Immunity, Innate/genetics , Mycobacterium tuberculosis/genetics , Tuberculosis/microbiology , Drug Resistance, Bacterial/genetics , Genetics, Population , Humans , Mycobacterium tuberculosis/immunology , Mycobacterium tuberculosis/metabolism , Phylogeny , Polymorphism, Single Nucleotide , Reinfection/microbiology , Sputum/microbiology , Treatment Failure , Tuberculosis/drug therapy , Whole Genome Sequencing

15.

Reintroduction of the archaic variant of NOVA1 in cortical organoids alters neurodevelopment.

Trujillo, Cleber A; Rice, Edward S; Schaefer, Nathan K; Chaim, Isaac A; Wheeler, Emily C; Madrigal, Assael A; Buchanan, Justin; Preissl, Sebastian; Wang, Allen; Negraes, Priscilla D; Szeto, Ryan A; Herai, Roberto H; Huseynov, Alik; Ferraz, Mariana S A; Borges, Fernando S; Kihara, Alexandre H; Byrne, Ashley; Marin, Maximillian; Vollmers, Christopher; Brooks, Angela N; Lautz, Jonathan D; Semendeferi, Katerina; Shapiro, Beth; Yeo, Gene W; Smith, Stephen E P; Green, Richard E; Muotri, Alysson R.

Science ; 371(6530), 2021 02 12.

Article in English | MEDLINE | ID: mdl-33574182

ABSTRACT

The evolutionarily conserved splicing regulator neuro-oncological ventral antigen 1 (NOVA1) plays a key role in neural development and function. NOVA1 also includes a protein-coding difference between the modern human genome and Neanderthal and Denisovan genomes. To investigate the functional importance of an amino acid change in humans, we reintroduced the archaic allele into human induced pluripotent cells using genome editing and then followed their neural development through cortical organoids. This modification promoted slower development and higher surface complexity in cortical organoids with the archaic version of NOVA1. Moreover, levels of synaptic markers and synaptic protein coassociations correlated with altered electrophysiological properties in organoids expressing the archaic variant. Our results suggest that the human-specific substitution in NOVA1, which is exclusive to modern humans since divergence from Neanderthals, may have had functional consequences for our species' evolution.

Subject(s)

Cerebral Cortex/growth & development , Cerebral Cortex/physiology , Neanderthals/genetics , Neurons/physiology , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Alleles , Alternative Splicing , Amino Acid Substitution , Animals , Binding Sites , Biological Evolution , CRISPR-Cas Systems , Cell Proliferation , Cerebral Cortex/cytology , Gene Expression Regulation, Developmental , Genetic Variation , Genome , Genome, Human , Haplotypes , Hominidae/genetics , Humans , Induced Pluripotent Stem Cells , Nerve Net/physiology , Nerve Tissue Proteins/genetics , Nerve Tissue Proteins/metabolism , Neuro-Oncological Ventral Antigen , Organoids , Synapses/physiology

16.

Genomic basis for RNA alterations in cancer.

Calabrese, Claudia; Davidson, Natalie R; Demircioglu, Deniz; Fonseca, Nuno A; He, Yao; Kahles, André; Lehmann, Kjong-Van; Liu, Fenglin; Shiraishi, Yuichi; Soulette, Cameron M; Urban, Lara; Greger, Liliana; Li, Siliang; Liu, Dongbing; Perry, Marc D; Xiang, Qian; Zhang, Fan; Zhang, Junjun; Bailey, Peter; Erkek, Serap; Hoadley, Katherine A; Hou, Yong; Huska, Matthew R; Kilpinen, Helena; Korbel, Jan O; Marin, Maximillian G; Markowski, Julia; Nandi, Tannistha; Pan-Hammarström, Qiang; Pedamallu, Chandra Sekhar; Siebert, Reiner; Stark, Stefan G; Su, Hong; Tan, Patrick; Waszak, Sebastian M; Yung, Christina; Zhu, Shida; Awadalla, Philip; Creighton, Chad J; Meyerson, Matthew; Ouellette, B F Francis; Wu, Kui; Yang, Huanming; Brazma, Alvis; Brooks, Angela N; Göke, Jonathan; Rätsch, Gunnar; Schwarz, Roland F; Stegle, Oliver; Zhang, Zemin.

Nature ; 578(7793): 129-136, 2020 02.

Article in English | MEDLINE | ID: mdl-32025019

ABSTRACT

Transcript alterations often result from somatic changes in cancer genomes [1]. Various forms of RNA alterations have been described in cancer, including overexpression [2], altered splicing [3] and gene fusions [4]; however, it is difficult to attribute these to underlying genomic changes owing to heterogeneity among patients and tumour types, and the relatively small cohorts of patients for whom samples have been analysed by both transcriptome and whole-genome sequencing. Here we present, to our knowledge, the most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumour transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) [5]. Using matched whole-genome sequencing data, we associated several categories of RNA alterations with germline and somatic DNA alterations, and identified probable genetic mechanisms. Somatic copy-number alterations were the major drivers of variations in total gene and allele-specific expression. We identified 649 associations of somatic single-nucleotide variants with gene expression in cis, of which 68.4% involved associations with flanking non-coding regions of the gene. We found 1,900 splicing alterations associated with somatic mutations, including the formation of exons within introns in proximity to Alu elements. In addition, 82% of gene fusions were associated with structural variants, including 75 of a new class, termed 'bridged' fusions, in which a third genomic location bridges two genes. We observed transcriptomic alteration signatures that differ between cancer types and have associations with variations in DNA mutational signatures. This compendium of RNA alterations in the genomic context provides a rich resource for identifying genes and mechanisms that are functionally implicated in cancer.

Subject(s)

Gene Expression Regulation, Neoplastic , Neoplasms/genetics , RNA/genetics , DNA Copy Number Variations , DNA, Neoplasm , Genome, Human , Genomics , Humans , Transcriptome

17.

A Massively Parallel Fluorescence Assay to Characterize the Effects of Synonymous Mutations on TP53 Expression.

Bhagavatula, Geetha; Rich, Matthew S; Young, David L; Marin, Maximillian; Fields, Stanley.

Mol Cancer Res ; 15(10): 1301-1307, 2017 10.

Article in English | MEDLINE | ID: mdl-28652265

ABSTRACT

Although synonymous mutations can affect gene expression, they have generally not been considered in genomic studies that focus on mutations that increase the risk of cancer. However, mounting evidence implicates some synonymous mutations as driver mutations in cancer. Here, a massively parallel assay, based on cell sorting of a reporter containing a segment of p53 fused to GFP, was used to measure the effects of nearly all synonymous mutations in exon 6 of TP53. In this reporter context, several mutations within the exon caused strong expression changes, including mutations that may cause potential gain or loss of function. Further analysis indicates that these effects are largely attributed to errors in splicing, including exon skipping, intron inclusion, and exon truncation, resulting from mutations both at exon-intron junctions and within the body of the exon. These mutations are found at extremely low frequencies in healthy populations and are enriched a few-fold in cancer genomes, suggesting that some of them may be driver mutations in TP53. This assay provides a general framework to identify previously unknown detrimental synonymous mutations in cancer genes. Implications: Using a massively parallel assay, this study demonstrates that synonymous mutations in the TP53 gene affect protein expression, largely through their impact on splicing. Visual Overview: http://mcr.aacrjournals.org/content/molcanres/15/10/1301/F1.large.jpg Mol Cancer Res; 15(10); 1301-7. ©2017 AACR.

Subject(s)

Cell Separation/methods , Flow Cytometry/methods , Sequence Analysis/methods , Silent Mutation , Tumor Suppressor Protein p53/genetics , Cell Line , Exons , Green Fluorescent Proteins/metabolism , Humans , Sequence Analysis, DNA , Sequence Analysis, RNA
